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Abstract 

For extreme value estimation we propose to use a 
model with a Dirichlet process mixture of gamma 
densities in the center and generalized Pareto den- 
sities for the tails. Due to the randomness in the 
center and a heavy tailed density in the tails density 
estimation and posterior inference for high quan- 
tilcs arc possible. The approach can be used in a 
"default" manner on the positive reals because it 
works when prior information is unavailable. The 
proposed model can be easy to implement and a 
sensitivity analysis is provided. We applied the pro- 
posed model for simulated and real data sets. 

keywords Generalized Pareto Distribution, 
Threshold Estimation, Dirichlet Process Mixture. 

1 Introduction 

In this paper we use two important reference to 
build the model we are proposing. The first is a 
model proposed by Ferraz F, Gammerman D and 
Freitas H. (2011) and the second is the Dirichlet 
process mixture of gamma densities proposed by 
Hanson (2006). The combination of the two pro- 
posals in a single model can be a powerful tool to 
density estimation in the center of the distribution 
and to extreme value estimation in the tails of the 
distribution. This paper is organized as follow. Sec- 
tion 2 is devoted to present the model and algorithm 
of our proposal. In Section 3 we present a sensi- 
tivity analysis. In Section 4 we present simulated 
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examples for the CDF. Section 5 shows posterior 
predictive density estimation. Section 6 displays 
the application in this paper: river flow levels in 
Guarabo Puerto Rico. Finally in Section 7 we have 
the conclusions and some important remarks. 

2 Model 

In this section we show the proposed model. We 
follow the proposal of Ferraz F, Gammerman 
D and Freitas H. (2011) to build the underline 
density. Ferraz F, Gammerman D and Freitas H. 
(2011) contemplate a model that incorporates a 
nonparametric specification for the center of the 
distribution and a parametric approach for the 
tails. They consider a mixture of gammas densities 
for the center of the distribution following the 
proposal of Wiper M, Insua D.R and Ruggeri F. 
(2001) who propose to put prior probabilities on the 
number of components of the mixture and to use 
the reversible jump algorithm to obtain inference. 
In this work we use a Dirichlet process mixture pro- 
cess of gamma densities (DPMG) accommodating 
a very wide variety of shapes and spreads. Hanson 
(2006) found that the using DPMG improve the 
density estimation to univariate densities on the 
positive reals. Also, the reversible jump approach 
allows for very precise information on the number 
of components in the mixture. However not always 
prior information is available and in it is more 
remarkable when we have extreme values in the 
density. Another advantage of the Dirichlet Process 
Mixture is that it does allow control the expected 
number of components (Antoniak C. E. (1974)). 
Also, because the cumulative distribution function 
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is random in the DPMG allows we can do posterior 
inference for the underline cumulative distribution 
function. 

Let the vector of parameters $ = (^,(t, m). The 
density of the Generalized Pareto Distribution with 
scale parameter a and shape parameter 7 is as fol- 
low: 



9{x[ 



iexp(-(a;-u)/a) if ^ = 

(1) 

where (j) = {u, A) and cc — M>Ofor^>0 and 
< a; - M < -(f)/C for ^ < 0. We have the GDP is 
bounded from below by u, is bounded from above 
by u — cr/^ if ^ < and unbounded from above if 
^ > 0. Pickands J. (1975) shows that using a GPD 
for the tails in a distribution in the positive real 
line we can expect to obtain better results than only 
consider a single density. According to the Pickands 
J. (1975) theorem we can use in the center of a 
distribution a gamma mixture of densities (because 
they belong to the domain of attraction Section 1. 
Theorem Pickands J. (1975)) and in the tails of the 
distribution a GPD. 

Therefore the density proposed can be written: 



arbitrary measurable partition, Bi, B2, B^. of 
the joint distribution of (G(Si), G(B2), G(Sfc)) 
is Dirichlct {aGo{Bi),aGQ{B2),...,aGa{Bk)) 
where G{Bi) and Go{Bi) denote the probability of 
set (Bi) under G and Go respectively. Here Go is 
a specific distribution on and a is a precision 
parameter. Let K{;,6) be a parameter family of 
distributions functions (CDF's) indexed by 6 e Q, 
with associated densities k{;9). If G is proper we 
define the mixture distribution 



F{.;G) = I K{;,0)G{de) 



(4) 



where G{d9) can be interpreted as the condi- 
tional distribution of 9 given G. We can express (5) 
as f{-\G) = J k{.;G) differentiating with respect 
to (.). Due to G is random then F{.; G) is random. 
Sometimes we are interest in functionals of i^(.;G) 
we denoted these as Q{F{.;G)). 

It is consider F{.; G) as the model for the stochas- 
tic mechanism corresponding to xi, 2:2, x„ as- 
suming Xi given G i.i.d. from F{.-.G) with the DP 
structure. We will work with a DP mixture model of 
gamma densities therefore the introduction of mix- 
ing parameters (Ai,7i) = associated with each Xi. 
The model can be expressed in hierarchical form 
such as: 



f{x\<i>,e) 



h{x\6) X < u 

[1 - H{u\e)]g{x\^) x>u 



(2) 



where </) = (u,^, A), 6 = (A, 7) and H{u\0) de- 
notes the cumulative distribution function (CDF) 
of h{x\9) at u 

(The density (3) was the proposal in Ferraz F, 
Gammerman D and Freitas H. (2011)). The cumu- 
lative distribution function of (2) is as follow: 



Xi\\i,^i,ur^ h{xi,6i), i = l,..,n (5) 
eilG-^G, i = l,..,n 

G\a,r^^DP{a,Go),Go = Go{.\v) 
a,r] ^ p{a)p{r]) 

here 6i = (Xi, ji) and h(xi, 6i) denotes the gamma 
density with the scale parameter, Aj , and the shape 
parameter, 7i, as follow: 



F{x\ct>,9) = 



H(x\e) x<u 
H{u\e) + [1- H{u\e)\G{x\it>) x>u 

(3) 

where G{x\(l)) is the CDF of GPD. Note 

that limx >u-F{x\(l),9) = H{u\9) and 

lima; ^u+F{x\(j),0) = H{u\9) therefore (3) is 

continuous at u. The choice in this paper for the 
central part in (2) is to use a DPMG. Therefore 
following Ferguson (1973) a distribution G on O 
follows a dirichlet process DP{a,Go) if, given an 



M^i|Ai,7i) = ^^^"' 



exp{-7ja;i} Xi > (6) 



Following Hanson (2006) we use for 5o(A, 7|ry) two 
independent exponential distributions: 



fl'o(A, 7I??) = ax exp{-axX)a^ exp(-a^7) (7) 

with T] = {ax, a-y). For the hyperparameters of (5) 
two flexible gamma priors ax ^ ^{bx, ca) and a^ ~ 
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r(6^,c^), with r(a, 6) denotes the gamma density 
with hyperparameters a and b. 

Now we need to find the posterior inference 
for the threshold u, the scale parameter a and 
shape parameter ^ in the GPD. So we propose to 
use a metropolis algorithm for these parameters. 
The prior distribution for w is a normal density 
A^(m„,cr^) as suggest Behrens C, Gammerman D. 
and Lopez H. (2004). Castellanos E and Cabras S. 
(2007) obtained the Jeffreys non-informative prior 
for (cr, ^) and they found using this prior the poste- 
rior distribution is proper . The prior is the follow- 
ing: 



p - Hiu\\,j) 



(12) 



p(a,0«^~'(l+0"'(l + 20"'/' 



(8) 



" l-H{u\X,j) 

therefore we need to solve G{q\(j>) — p* in 
order to find the p quantile, q. We can use 
the posterior predictive distribution in the DPMG 
to compute approximations of H{q\X, 7) with q < u. 

Figure 1. displays the model (3). This model al- 
lows for a discontinuity of the density at the thresh- 
old. Continuity constrains is basically solve defining 
adequate models for the posterior analysis. CDF's 
are display in Figure 2 we can see the continuity in 
the functions. 



where ^ > —0.5 and fi > 0. According to Coles 
S. and T. Jonathan (1996) situations were ^ < —0.5 
are very weird in practice. 

The posterior distribution using the density (3) 
is then: 
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Pi0,^x)^l[h{x\9)ll{l-Hiu\e))- (l+e- 

A B a \ a 

(9) 

for ^ 7^ and 
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p{0, ^\x) cx Y[ h{x\9) - H{u\d))-ejip{-{x - u)/a) 

A B ^ 

(10) 
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X p{u)p{i)p{a)p(\,-i\r\) 

for C = 0. With A = {x; : < u} and B = 
{xi : Xi> u} 

Using the model proposed we can do extreme 
value and density estimation at the same time. For 
example in order to find values beyond the thresh- 
old we have that 



Figure 1: Probability density function of the model 
(3) for a number of parameters values: (a) ^ — —0.4 
and cr = 3, (b) f = 0.4 and cr = 3, (c) ^ = -0.4 and 
cr = 4 and (d) ^ = 0.4 and cr = 4, threshold u = 11 
and the center of the densities is a mixture of two 
gamma densities the tails are modelling with GPD. 



F(x|0, A,7) = H{u\X,-i) + [1 - H{u\\n)]G{x\<j,) 

(11) 

where G{x\(j)) is the CDF of the GPD. So we can 
find very easy high quantiles beyond the threshold 
for example the p quantile can be found using 
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Figure 2: CDF of the model (3) for a number of 
parameters values: (a) — —0.4 and cr — 3, (b) 
^ = 0.4 and cr = 3, (c) ^ = -0.4 and ct = 4 and (d) 
^ = 0.4 and a — 4, threshold u = 11 and the center 
of the densities is a mixture of two gamma densities 
the tails are modelling with GPD. 



Figure 3: Posterior CDF with fixed a — 1 using the 
original definition of the DP the dashed red lines is 
the 95% credible interval of the estimates and the 
red solid line is the point estimate. The black line 
is the true CDF and the solid black the simulated 
CDF for the GPD. 



3 Sensitivity analysis and sim- 
ulations 

In this section we consider simulations and poste- 
rior inference. First we need to consider the hy- 
perparameters in the DPMG. a indicates a priori 
the number of components we have in the mixture. 
According to Hanson (2006) a mixture of gammas 
densities with more than eight components would 
not be identifiable. The author consider values such 
as 0.1, 1 and he obtains excellent results. The result 
is important because not always prior information 
about a is available. In order to explore the preci- 
sion for the DP here we consider a random follow- 
ing a distribution r(2, 2) where the expect number 
of components is approximately 4 (Escobar M. and 
West M. (1995)). On the other the hyperparame- 
ters of go can be expressed in terms of the mean 
fi = A/7 and variance V = of h{x\d) such as 

(see Hanson (2006)): 




Figure 4: Posterior CDF with a ^ r(2, 2) using the 
original definition of the DP the dashed red lines is 
the 95% credible interval of the estimates and the 
red solid line is the point estimate. The black line 
is the true CDF and the solid black the simulated 
CDF for the GPD. True threshold is the rect red 
4 line and simulated threshold is the rect black line. 



f{ii\ax,ay) 



f{V-^\ax,a^)^r{2,ax^l^ + a^^l) (13) 
both densities are diffuse. An important remark 



we found is when ax 



1 



/(m|i,i) = 



1 

(1+7^ 



(14) 



which is the Beta Prime distribution with hyper- 
paramters of scale and shape equals to 1. In recent 
works the Beta prime distribution has been used 
as default density for modelling the variance in Dy- 
namic linear models, Hierarchical and Robust infer- 
ence in Bayesian analysis. Therefore we can think 
that we are modelling the mean of the mixture of 
gammas densities in a non informative (but robust) 
fashion. 

4 Results for the CDF and pa- 
rameters in the GPD 

We use a sample size n = 100 in order to know our 
model is useful in practice with small sample sizes. 
Hanson (2006) showed that using the DPMG with 
different specifications for a and large sample sizes 
1000 and 10000 the model works. We have that 
^ = 0.4, a — 3 and the threshold m = 11 at the 
90% quantile. The simulated mixture density for 
the central part is: 
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Figure 5: Posterior distribution of u, H{u\d), ax, 
and the expected number of clusters, n* when 
a is random. 



Figure 5 has the hyperparameters of go as also the 
posterior inference for u when a is random we can 
see the prior and posterior distribution for a are 
near. The value H(u\9) is approximately to the real 
value 0.9. And the posterior mean of the threshold 
is approximately 10. 



h{x) = 0.5r(x|10,4) -^0.5r(x|6,0.7) (15) 

The hyperparameters for ax and a^ are bx — b-y — 
cx = — 0.001 in order to be non informative in 
the hyperparameters of Go. For the threshold u 
the prior density is centering in the actual value 
and the variance cr^ gives 99% of probability in the 
range between 50% and 99% of the data. In the 
first simulation we are interested in the estimation 
of the CDF's using DPMG therefore we fixed ^ and 
a and we are looking for inference for the threshold 
and the CDF. Figure 3 and Figure 4 display the 
results. We can see the simulated CDF's are ap- 
proximately the same to the underline CDF. The 
results are similar with a random or non random. 
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5 Results for the posterior 
predictive density and GPD 
parameters 



Here a — 1 therefore the number of expected com- 
ponents is 5. 10000 MCMC shnulations were used 
and we burn the first 8000 simulations. Figure 6. 
displays the quality of the approach even for small 
sample sizes the approach works. Figure 7 shows 
that the tails in the GPD works and it reproduces 
the underline density in the tails. Is important to 
consider that we use a small sample n=100 because 
we need our model works with any large or small 
sample sizes. The density estimation for the central 
part in our model is improved when large sample 
sizes are considered (see Hanson (2006)). Figure 8 
and Figure 9 show the predictive quantiles at 95% 
and 99%. We can see that for quantiles at 95% and 
99% the posterior quantile predictive densities mod- 
elling the quantiles accurately. Figure 10 shows the 
posterior distribution of the parameters m, a and ^. 
We can see there are no problems with the conver- 
gence of the posterior estimates and the posterior 
distribution represents nicely the true parameters. 
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Figure 7: Dashed red line: Posterior predictive den- 
sity using the Dirichlet process mixture of densities 
in the central part and the posterior predictive dis- 
tribution using GPD in the tails. Full black line the 
true density. 




Figure 6: Dashed red line: Posterior predictive den- 
sity using the Dirichlet process mixture of densities 
in the central part and the posterior predictive dis- 
tribution using GPD in the tails. Full black line the 
true density 



Figure 8: Posterior histogram of the 95% quantile 
for the simulation. Red line the true quantile. 
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6 Application of the proposal 
to the levels of flow in the 
river Gurabo 

The river flow levels are important measure to pre- 
vent damage in populations. Therefore we applied 
our proposal in the river flow levels measured at 
ft^/s in the river Gurabo at Gurabo Puerto Rico. 
The data is free available at waterdata.usgs.gov. 
We monitoring the flows between December 2 2012, 
12:00 am to December 4 2012, 8:45 pm. The mea- 
sures are made each 15 minutes. We have a sample 
size of n=254. 

Figure 11 displays the posterior histograms for 
the GPD parameters and we see all parameters con- 
verge. Figure 12 shows the posterior distribution 
for the 99.9% high quantile both the maximun value 
and the posterior mean for the quantile at 99.9% are 
the same. This result is useful because this remarks 
the coherence of the model. Also the posterior dis- 
tribution is asymmetric which is expected. Figure 
13 displays all paths of the posterior random den- 
sity using the DPMG. For the tails we have the good 
approximation using the GPD. Figure 14 displays 
the average of the posterior predictive density us- 
ing the DPMG and for the tails we have the GPD. 
We can see our proposal reproduces the data in the 
center of the density and in the tails. According to 
the posterior analysis (based in the last two days) 
with a probability of 0.1% we can see values bigger 
than 1888 ft^/s in the Guarabo River. 



Figure 9: Posterior histogram of the 99% quantile 
for the simulation. Red line the true quantile. 
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Figure 10: Posterior histogram for the GPD param- 
eters for the simulation. 
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Figure 11: Posterior histogram for the GPD param- 
eters for the apphcation. 
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Figure 12: Posterior histogram of the 99.9% quan- 
tile for the apphcation. Red hne the maximum ob- 
served data. 



Figure 13: Paths of the posterior predictive density 
using the dirichlet process mixture of densities in 
the central part and predictive distribution using 
GPD in the tails. Dashed red line is the contin- 
uation in the tails with the GPD. The histogram 
display the real data. The dashed line display the 
posterior mean of the distribution of u. 

Histogram of y 



500 



1000 



1500 



Figure 14: Posterior predictive density using the 
dirichlet process mixture of densities in the central 
part and predictive distribution using GPD in the 
tails. Dashed red line is the continuation in the tails 
with the GPD. The histogram display the real data. 
The dashed line display the posterior mean of the 
distribution of u. 



7 Conclusion 

In this paper we propose to use a model with 
a Dirichlet mixture process of gamma densities 
for the center of the distribution and a heavy 
tailed Generalized Pareto Distribution for the 
tails. The proposal works and we can applied 
the proposed model to small and large sample 
sizes. The posterior inference estimation for the 
center of the distribution is made using a polya 
urn scheme with a the original definition of the 
DP. For the parameters of the GPD in the model 
we use three steps with the the metropolis Hasting 
algorithm. The proposal in this paper works well 
and posterior inference for the proposed model 
can be make. In order to use the proposed model 
even small sample sizes can be considered. Finally 
we can extended the proposed model to different 
areas in Bayesian Statistics such as Clinical Trials, 
Dynamic Financial Models and Linear Regression. 
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A MCMC scheme 

For the DPMG we need to compute h{x\9), H{x\0) 
and therefore H{u\6) therefore in order to do poste- 
rior inference we consider the polya urn expression 
in the DPMG to compute posterior realizations for 
the density h{x\6) and the original DP definition to 
compute posterior realizations for the CDF H{x\0) 
and hence H{u\9). 

Let {61, ...,^*. } be the unique values of 6i, coi = 
j if and only ii 9i = 9* i = 1,2, ,...,n and nj = 
\{i : u>i = j}| and j = 1,2, ...,n* with n* number 
of distinct values. We use the following transition 
probabilities: 

1. Polya urn: marginalized G (using ~ to indicate 

summaries without uji) and defining a specific 
configuration {wi, .., w„} with transition prob- 
abilities: 



p{iOi=t\oj.,)<xh^ { ^V"""*"' (16) 
la J = n +1 

2. Resampling cluster membership indicators Wj: 



9 



=j\,...,Xi) oc 



Uj k{xi;9*] 



j = l,...,n*" 

aJkixi;ei)dGoiei\rj) j = n*' + 1 
(17) 

where we use the close results in Hanson 
(2006): 



k{xi;e*) = h{xi\e*) 



I 



k{xi;9i)dGo{0i\n,T^) 



(18) 



(19) 



a\aj 



Xi{xi + ax){ax - log{xi/{xi + a^))y 



(20) 



with probability proportional to n~k{xi; 9*) we 
make 9i = 9*~ . On the other hand with prob- 
ability proportional to a J k(yi]9i,(f))dGo{9i\ri) 
wc open a new component and we sample 
9i = (Ai,7i). First we sample Xi\r] ~ r(2, oa — 
\og{x i/{xi + a^))'^) then we sample 7i|Aj,r7 
r(Ai + l,Xi + a^). 

3. Hypcrparamctcrs of Go 

We need to use {91, ...,^?*,} the unique values 
of 9i in order to sampling ax and where 

9* = (A*, 7*). Wc sample ax as ax\9 ^ T{bx + 

n*,cx + X)j=i ^^"i independent we sample 

ax as 0^16* ~ r{b^ + n*,c^ + 7^)- 

In order to estimate random posterior realiza- 
tions of H{x\9) we need to consider the origi- 
nal definition of the DP (see for example Milo- 
van K., Athanasios K. and Draper D. (2008)). 
Based on Antoniak C. E. (1974) the joint pos- 
terior for G, 6 — {9i,...,9n) and the hyperpa- 
rameters {a,ax,a^) is given by 



p{G, 0, a, ax, a^\x) = p(G\6, a, ax, a^\x)p{0, a, ax, a^\x) 

we have that p(G\9,a,ax,a'y\x) is a DP with 
precision parameter a = a + n and base CDF 

Go{t) = ^Gom + i^j:tiko.,oo){t). 

Because we have b = 1,..,B posterior samples 



of {9^ , a'' , a''^, a'^) a sample of can be 
- obtained from p{G\6^,a'',a\,a^\x). Therefore 
consider a grilled of points ti < t2-.- < Im 
on the positive real line then we can 
draw from the posterior distribution of 
{G{ti),...,G(tM)} based on the DP defini- 
tion {G{t^),G{h) - G(ii),...,l - G{tM)} 
has a Dirichlet distribution with parame- 
ters {Q:Go(ti),a(Go(t2) - Go(ti)),...,a(l - 
Go(tM))}- Hence in each iteration b we obtain 
a sample {9^ ,a\,a^^) therefore a sample 
of G^ we can obtained, {G{tif , ...,G{tMf}. 
Finally we have B posterior realizations and 
we can do posterior inference for the mixture 
H{.; G) and hence of functional, II{u; G). 



Now we are interested in to show the sampling 
for the parameters in the GPD defined in the 
tails of (2). Following Fcrraz F, Gammerman 
D and Freitas H. (2011) we compute the poste- 
rior distribution of u, ^ and a using three steeps 
of the Metropolis Hasting algorithm. The al- 
gorithm is as follow: 

• Sampling ^: proposal transition kernel is 
given by a truncated normal 



C\e ~ iVr, V^)I{-a"/{M - u"), oo) 

(21) 

where is a variance in order to improve 
the mixing. M is the maximum value in 
the sample the acceptance probability is 



^ p{9*,<i>*\x)me+(T''/iM-u^))/ 

' p{9f>, (l)'>\x)^{{^* + a*/{M - u*))/ 




where is the density function of the stan- 
dard normal distribution. 

• Sampling a: If ^(''+1) > then a* is 
sampled from the Gamma distribution 
r((j2('')/14,crVV<,) where K is a variance 
in order to improve the mixing. On the 
other hand if ^(''+1) < then a* is sam- 
pled from a truncated normal 

u*\a^ ~ A^(a^V;)/(-^(''+l)(M 

(22) 
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the acceptance probabilities are respec- 
tively: 



1 'p(6»^0''|a;)$((CT*+C(''+l)(M-u^')/yK^) J 



• The threshold u* is sampled following the 
requirement of the lower truncation for 
the GPD. Therefore v* is sampled using 
a truncated normal density 

CT*|a^-7V(M^K)/(a(''+l^oo) (23) 

If ^(^+1) > then 0^''+^) is the minimum 
value at the sample in the iteration b + 
1 otherwise if < a^^+'^'> = M + 

cr(f'+i)/^(6+i). The acceptance probability 
is then 



and 
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